Entry Name:  "TCS-Paneri-MC2"

VAST Challenge 2016
Mini-Challenge 2

 

 

Team Members:

Kaushal Paneri, kaushal.paneri@tcs.com     PRIMARY
Gunjan Sehgal, sehgal.gunjan@tcs.com

Aditeya Pandey, aditeya.pandey@tcs.com

Bindu Gupta, bindu.gupta2@tcs.com

Siddharth Verma, verma.sid@tcsin.com

Karamjit Singh, karamjit.singh@tcs.com

Geetika Sharma, geetika.s@tcs.com

Gautam Shroff, gautam.shroff@tcs.com

Student Team:  NO

 

Tools Used:

d3.js

HTML5

CSS

jQuery

R

Python

Parallel Coordinates (0.7.0) (https://syntagmatic.github.io/parallel-coordinates/)

 

Approximately how many hours were spent working on this submission in total?

150 hours

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2016 is complete? Yes

 

Video

https://youtu.be/YDJgy42WWcE

 

Questions

MC2.1 What are the typical patterns visible in the prox card data? What does a typical day look like for GAStech employees?

Static Prox Data

Data Pre-processing

We discovered that prox ids are a combination of employee first and last names and used this to assign an employee name and department to each prox id. Six employees have multiple prox cards issued in their names. Eleven employees do not appear in the prox card data at all. Of these, 10 are from Security and 1 is from Facilities.

To discover the typical patterns in prox card behaviour, we computed a 1D histogram for each prox id with the total time spent by the employee in each floor zone. To track the movement of employees within the building, we created a 2D histogram for each prox id for each day, with 5 minute time bins on one axis and zones on the other. For each prox id we predicted an office as the zone in which the employee spent the maximum time according to the 1D histogram. A confidence score was computed for this prediction as the distance between the employee's histogram and an ideal histogram with the peak bin count set to 1 and the rest to 0. We found 5 employees for whom the predicted and actual office zones did not match.
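The office prediction and its confidence score can be illustrated with a minimal sketch. It assumes the histogram is normalised to sum to 1 and that the distance is Euclidean; neither detail is stated above, and the zone names and durations are made up for illustration.

```python
import math
from collections import defaultdict


def zone_histogram(visits):
    """Sum minutes per zone from (zone, minutes) records and normalise to 1."""
    totals = defaultdict(float)
    for zone, minutes in visits:
        totals[zone] += minutes
    grand_total = sum(totals.values())
    return {zone: t / grand_total for zone, t in totals.items()}


def predict_office(hist):
    """Return (peak zone, distance to the ideal histogram).

    The ideal histogram has 1 in the peak bin and 0 elsewhere, so a
    smaller distance means a more confident prediction.
    Assumption: Euclidean distance on a normalised histogram.
    """
    office = max(hist, key=hist.get)
    distance = math.sqrt(
        sum((p - (1.0 if zone == office else 0.0)) ** 2 for zone, p in hist.items())
    )
    return office, distance


# Hypothetical records for one prox id: (zone, minutes spent).
visits = [("F2-Z1", 300), ("F1-Deli", 50), ("F2-Z7", 50), ("F2-Z1", 100)]
hist = zone_histogram(visits)
office, distance = predict_office(hist)
print(office, round(distance, 3))
```

An employee who splits time evenly between two zones gets a large distance, which marks the predicted office as unreliable.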



































We also clustered the 1D histograms using BIRCH clustering over all employees and days and obtained 38 clusters.

From this data, we filtered on employee and obtained clusters for individual employees. We found that all except 15 employees had only one cluster for all 14 days, implying that these employees had the same pattern of behaviour every day. Visual analysis of the clusters led us to discover that there are three work shifts at GAStech: a morning shift with 97 employees, an evening shift with 15 employees and a night shift with 2 employees. We describe the patterns in each shift below.

Morning Shift Patterns

Employees in the morning shift arrive at around 7:30am and leave at around 5pm. These employees spend most of their time at their offices (5-6 hours), go to the Deli for lunch at around 12 noon (40-60 minutes) and spend some time in other office zones, meetings or trainings.



Clusters in Prox ID data



























Typical Behaviour of an Employee























For 15 employees in the morning shift, we obtained more than 1 cluster each, indicating that they had different behaviours on different days.

When we group the clusters by department, we find that the large departments of Engineering and IT have only 4-5 different behaviours, whereas Facilities has 9 different behaviours. These clusters differ by the shifts of the employees: we have one evening shift cluster in each department and one night shift cluster in Facilities. There are multiple morning shift clusters, which differ mainly in the location of the offices of the employees in them.



Department-wise Clusters in Prox Data

























To get further insight into the behaviour of employees apart from the time they spend in their office zone, we modified the histograms by setting the peak bin to zero, assuming the peak to be the office, and clustered again. Here again we find only a few different behaviours for employees in Information Technology, Engineering, Security and Facilities. These clusters differ only in the amount of time spent at the Deli or in other office zones with training and conference rooms. A large number of individual behaviours are observed for employees in Administration, HR and Executives.

Behaviour outside office zone

























Evening Shift

In the evening shift, there are 7 employees from Facilities, 3 from Engineering and 5 from IT. They arrive at around 4pm and leave at around 12am. There is one cluster per department for evening shift employees and none of the employees have different behaviours on different days. Interestingly, there is no data for security personnel in the evening shift. It is possible that the security personnel who were not issued prox cards work in the evening shift.

Night Shift

In the night shift there are only 2 employees, both from Facilities. They arrive at around 12am and leave at 7am. These two spend most of their time in their office on Floor 1 and some time in the Conference zone on Floor 1.

Mobile Prox Data

We discovered that Rosie moves around the building delivering mail from 9am-10am and 2pm-3pm every day. Most employees are detected the same number of times every day at the same location in the morning and afternoon. Three employees from Facilities are detected very few times. They spend most of their time in the Loading zone on Floor 1. There are 18 employees whom Rosie never detects. Of these, 15 are in the evening shift and 2 are in the night shift. One employee, Jeanette Frost, is in the morning shift yet is never detected by Rosie.

MC2.2 Describe up to ten of the most interesting patterns you observe in the building data. Describe what is notable about the pattern and explain what you can about the significance of the pattern.

To discover patterns in the sensor data, we computed a correlation matrix for each sensor across different locations.
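As a hedged illustration of a per-sensor correlation matrix across locations, the sketch below uses the Pearson coefficient. The choice of coefficient is an assumption, since the measure is not named above, and the location labels and readings are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series_by_location):
    """Return {(location_a, location_b): correlation} for one sensor type."""
    locations = sorted(series_by_location)
    return {
        (a, b): pearson(series_by_location[a], series_by_location[b])
        for a in locations
        for b in locations
    }


# Hypothetical readings of one sensor type at three locations.
readings = {
    "F1_Z1": [1.0, 2.0, 3.0, 4.0],
    "F2_Z1": [2.0, 4.0, 6.0, 8.0],
    "F3_Z1": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(readings)
print(round(matrix[("F1_Z1", "F2_Z1")], 3), round(matrix[("F1_Z1", "F3_Z1")], 3))
```

A location whose row shows low correlation with every other location, while the rest correlate highly with each other, is a candidate for unusual behaviour.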

 

1. Equipment power shows mid to high correlation between pairs of office zones and is highly correlated across all floors.

 

2. Mass flow rate and Damper position are highly correlated across all floors.

 

3. Thermostat cooling setpoint shows high correlation across all floor zones except Floor 3 Zone 1. The time series of the Floor 3 Zone 1 Thermostat cooling setpoint below confirms the unusual behaviour.

 

Further, we look for sensor patterns by finding the locations at which a sensor behaves the same way, along with the number of days for which the pattern exists. We use clustering and frequent itemset mining for this. For each day, we first cluster the time series of each sensor S across locations. This gives clusters of locations at which S behaves the same way. We repeat this for all 14 days. We therefore get clusters C_ij = {l_k} of locations at which sensor S behaves the same way, where i is the cluster id and j = 1 to 14 is the day.

Item: Location
Transaction: Day-wise cluster of locations
Frequent Itemset: Set of locations with similar time-series

 

Next, we find frequent itemsets of locations using the R package 'arules'. As a result, for each sensor we get the top frequent itemsets in terms of support, where each itemset contains locations at which S behaves the same way and the support represents the percentage of days on which that pattern exists.
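The idea of support over day-wise clusters can be sketched in Python. This is a brute-force illustration, not the Apriori implementation in 'arules'. It assumes support is the fraction of days on which all locations of an itemset fall in the same cluster, and the cluster assignments are invented.

```python
from collections import Counter
from itertools import combinations


def frequent_location_sets(daily_clusters, min_support, max_size=3):
    """Return {itemset: support} for sets of locations clustered together.

    daily_clusters maps a day to that day's clusters (sets of locations)
    for one sensor. Clusters within a day are disjoint, so an itemset is
    counted at most once per day.
    """
    n_days = len(daily_clusters)
    counts = Counter()
    for clusters in daily_clusters.values():
        seen_today = set()
        for cluster in clusters:
            for size in range(2, max_size + 1):
                seen_today.update(combinations(sorted(cluster), size))
        counts.update(seen_today)
    return {
        items: n / n_days for items, n in counts.items() if n / n_days >= min_support
    }


# Hypothetical clusters of locations for one sensor over four days.
daily_clusters = {
    1: [{"F1_Z1", "F1_Z2", "F2_Z1"}, {"F3_Z1"}],
    2: [{"F1_Z1", "F1_Z2"}, {"F2_Z1", "F3_Z1"}],
    3: [{"F1_Z1", "F1_Z2", "F2_Z1"}, {"F3_Z1"}],
    4: [{"F1_Z1", "F1_Z2", "F3_Z1"}, {"F2_Z1"}],
}
itemsets = frequent_location_sets(daily_clusters, min_support=0.5)
for items, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(items, support)
```

Enumerating all combinations grows quickly with cluster size, which is why a pruning algorithm such as Apriori is preferable on the full building data.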

 

We use this visualization to find patterns in the building data. It shows which sensors behave similarly, at which locations and for how many days. The heatmap on the right has sensors on the x axis and frequent itemsets of locations on the y axis. Colour is mapped to the support of the frequent itemset. The left panel contains the floor plans for the three floors in the building. On clicking a bin, a tooltip shows its support and the floor plans show the exact locations that have similar behaviour.

4. Thermostat cooling setpoint behaves similarly at a large number of locations across the building, with high support.

5. Supply inlet temperature for itemset 7 shows similar behaviour at only one location.

MC2.3 Describe up to ten notable anomalies or unusual events you see in the data. Describe when and where the event or anomaly occurs and describe why it is notable. If you have more than ten anomalies to report, prioritize those anomalies that are most likely to represent a danger or serious issue for building operation.

Sensor Anomalies

To discover sensor anomalies we use a Gaussian model for each sensor. We compute the mean and standard deviation for all sensors at all locations for each hour. If the value of a sensor reading is outside the range mean +/- one standard deviation, we tag it as an anomaly. Using this model we get daily counts of anomalies for each sensor.
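A minimal sketch of this anomaly rule follows. It assumes the statistics are computed per sensor, per location and per hour of day across all days, and that the population standard deviation is used; both are our reading of the description, and the readings are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_anomalies(readings, k=1.0):
    """Return the (hour, value) readings outside mean +/- k standard deviations.

    readings holds (hour_of_day, value) pairs for one sensor at one
    location, pooled over all days. Assumption: one mean and one
    population standard deviation per hour of day.
    """
    by_hour = defaultdict(list)
    for hour, value in readings:
        by_hour[hour].append(value)
    stats = {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}
    return [
        (hour, value)
        for hour, value in readings
        if abs(value - stats[hour][0]) > k * stats[hour][1]
    ]


# Hypothetical readings: five days of the 9am and 10am values of one sensor.
readings = [
    (9, 20.0), (9, 20.0), (9, 20.0), (9, 20.0), (9, 30.0),
    (10, 21.0), (10, 21.0), (10, 21.0), (10, 21.0), (10, 21.0),
]
anomalies = flag_anomalies(readings)
print(anomalies)
```

With a one standard deviation band, roughly a third of readings from a truly Gaussian sensor would be tagged, so the daily counts are best read relative to each other rather than as absolute alarm counts.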

Our sensor anomaly tool enables visual analysis of sensor anomaly data. The top left pane allows selection of a floor and the top right pane shows a calendar heatmap with the total number of sensor anomalies on a day mapped to colour. The bottom pane shows glyphs for the sensors on a floor. On clicking a day in the calendar, the floor plan updates to show the floor-wise distribution of anomalies for that day. Additionally, a radial chart on the bottom right shows the hourly distribution of anomalies for the chosen day. The sizes of the sensor glyphs update to show their individual anomaly counts. Clicking on a zone shows the anomalies for all sensors in that zone.

1. We found that the largest number of sensor anomalies occurred on June 7 and 8 on all floors, with Floor 2 being the most anomalous.

2. The most anomalous sensors on Floors 2 and 1 were Thermostat cooling setpoint and Thermostat Temperature.

3. When we further visualized the time series of Thermostat cooling setpoint together with HVAC Electric Demand Power, we saw 2 spikes, on June 7 and 8. The pattern on these two days is also very different from that on the other days.

4. Sensor anomalies occurred on the first floor on both weekends. On June 5, Zone 3 (the main entrance) and Zone 5 (the Conference room) show anomalies in Return Outlet CO2 Concentration and Thermostat heating setpoint. On June 6, most anomalies are seen in Zone 7, consisting of the server room, offices and loading area, in both Thermostat cooling and heating setpoints. The same is observed in Zone 4, which has offices.

On June 11, Reheat coil power and Supply inlet temperature are anomalous in Zone 7 and the Deli. Also, the Hazium sensor is anomalous in the Corridor zone 8A. On June 12, Hazium is normal but the other two sensors still show anomalies in the same zones. Interestingly, the hours in which the most anomalies occur are the same on both days.

5. On June 11 and 12, Reheat coil power and Hazium sensors show anomalies on Floor 2 as well. Floor 3 does not show significant anomalies on June 11 and 12.

Prox Anomalies

1. We found that 10 security personnel and 1 employee from Facilities were never issued a prox id. This indicates that (a) they never came to the building, or (b) they were in the building without a prox id, or (c) they used another employee's prox id.

2. Six employees had multiple prox ids issued in their names. Of these, 5 had 2 ids and 1 had 5 ids. Further, 5 of these employees showed the same behaviour across their ids, and the ids were not used on the same day, so we may conclude that each set of ids was used by the same person. We describe the sixth person next.

3. Patrick Young, from Facilities, was issued 2 ids. Looking at the clusters for pyoung001, we find that the largest cluster shows a maximum time in one location of 3 minutes on multiple days. From the employee heatmap view (below) we see that pyoung001 is detected in the building for about 5 minutes, most of which is spent in the server room on Floor 3. Comparing the employee heatmap views for pyoung001 on May 31 and pyoung002 from June 2-13, we conclude that the first id was used by the real Patrick Young on May 31 and partially on June 1. From June 2 onwards, Patrick uses id pyoung002. An impostor uses the id pyoung001 on June 2, 8 and 10. On each of these days he is detected in the building for about 5 minutes in the server room. Could the impostor be Irene Nant from Facilities, who was never issued a prox id?

4. Jeanette Frost comes to the office from 8am to 1pm every day and stays on the first floor, yet is never detected by Rosie.

 

5. A Pinckney from the Engineering department has his office on Floor 2 Zone 1, but he is mostly found in Zone 7 of the same floor.

 

MC2.4 Describe up to five observed relationships between the proximity card data and building data elements. If you find a causal relationship (for example, a building event or condition leading to personnel behavior changes or personnel activity leading to building operations changes), describe your discovered cause and effect, the evidence you found to support it, and your level of confidence in your assessment of the relationship.

We performed causal analysis between the prox data and the building sensor data by combining the two datasets as follows. As prox zones are generally bigger than energy zones, for each prox zone we include all sensors from energy zones that overlap it or are contained within it. We also generated an additional feature per prox zone: the number of people in the prox zone at each time instant t. Therefore, for each prox zone FZ we have the sensor readings S_1, S_2, ..., S_k and the number of prox ids in FZ at time t.
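The employee count feature can be sketched as follows. The sketch assumes a person stays in the zone of their most recent prox reading until the next reading, and it ignores people leaving the building; the prox ids, zone labels and timestamps (minutes since midnight) are hypothetical.

```python
from bisect import bisect_right


def occupancy(events, zone, t):
    """Count prox ids whose latest reading at or before time t is in zone.

    events maps a prox id to a time-sorted list of (timestamp, zone).
    Assumption: a person remains in a zone until their next reading.
    """
    count = 0
    for readings in events.values():
        times = [ts for ts, _ in readings]
        i = bisect_right(times, t)
        if i > 0 and readings[i - 1][1] == zone:
            count += 1
    return count


# Hypothetical prox readings: timestamps in minutes since midnight.
events = {
    "id_a": [(480, "F1_Z1"), (540, "F2_Z2")],
    "id_b": [(490, "F1_Z1")],
    "id_c": [(600, "F1_Z1")],
}
counts = [occupancy(events, "F1_Z1", t) for t in (500, 550, 610)]
print(counts)
```

Evaluating this at the sensor sampling times gives an employee count series aligned with the sensor readings of the same prox zone.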

Approach: We calculated the immediate change in the employee count and in each sensor reading at a particular location as

Delta(t) = x(t+1) - x(t), where x is either the employee count or a sensor reading.

We then discretized the delta values into seven bins using Gaussian discretization with mean zero and a standard deviation given by

STD = (maximum distance from 0) / 3

The bins are NH (Negative High), NM (Negative Medium), NL (Negative Low), Z (Zero), PL (Positive Low), PM (Positive Medium) and PH (Positive High).
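The delta and binning steps can be illustrated with a short sketch. The bin boundaries are not given above, so the cut points at +/-0.5, +/-1.5 and +/-2.5 standard deviations are purely an assumption for illustration, as are the example counts.

```python
LABELS = ["NH", "NM", "NL", "Z", "PL", "PM", "PH"]

# Hypothetical cut points in units of the standard deviation.
EDGES = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]


def deltas(series):
    """Change between consecutive samples: x(t+1) - x(t)."""
    return [b - a for a, b in zip(series, series[1:])]


def discretize(delta_values):
    """Map each delta to one of seven labels.

    The standard deviation is set to (maximum distance from 0) / 3, so
    the largest change lies three standard deviations from zero.
    """
    max_distance = max(abs(d) for d in delta_values)
    if max_distance == 0:
        return ["Z"] * len(delta_values)
    std = max_distance / 3
    # The label index is the number of cut points the scaled delta exceeds.
    return [LABELS[sum(d / std > edge for edge in EDGES)] for d in delta_values]


# Hypothetical employee counts in one prox zone at consecutive time steps.
employee_counts = [0, 9, 9, 8, 4, 12, 3]
count_deltas = deltas(employee_counts)
labels = discretize(count_deltas)
print(count_deltas)
print(labels)
```

Because the standard deviation is derived from the single largest change, one extreme spike widens every bin and pushes most other changes into Z.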

Instead of keeping the sensor reading delta at time t and the employee count delta at time t in the same row, we map the employee count delta at time t to the sensor reading delta at time (t+1). As a result, each row indicates a candidate causal relationship between the employee count delta and the sensor reading delta. We calculated the top rules with the change in the number of people as antecedent and the next corresponding change in a sensor as consequent, and filtered out all rules whose antecedent was a Zero or low change to keep only the extreme rules.
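A minimal sketch of the lagged pairing and rule scoring, assuming single-item antecedents and consequents and the standard definitions of support and confidence. The label sequences are invented, and the function name is hypothetical.

```python
from collections import Counter


def lagged_rules(count_labels, sensor_labels, skip=("Z", "NL", "PL")):
    """Score rules {count change at t} -> {sensor change at t+1}.

    Returns (antecedent, consequent, support, confidence) tuples sorted
    by confidence. Antecedents in skip (zero and low changes) are dropped.
    """
    pairs = list(zip(count_labels[:-1], sensor_labels[1:]))
    antecedent_counts = Counter(a for a, _ in pairs)
    pair_counts = Counter(pairs)
    rules = [
        (a, c, n / len(pairs), n / antecedent_counts[a])
        for (a, c), n in pair_counts.items()
        if a not in skip
    ]
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical discretized deltas for one prox zone and one sensor.
count_labels = ["NH", "Z", "PH", "NH", "Z", "NH", "Z"]
sensor_labels = ["Z", "NH", "Z", "PH", "NH", "Z", "NM"]
rules = lagged_rules(count_labels, sensor_labels)
for antecedent, consequent, support, confidence in rules:
    print(f"{{Emp_Count_{antecedent}}} -> {{Sensor_{consequent}}}",
          round(support, 2), round(confidence, 2))
```

The one-step lag only establishes that the sensor change follows the count change in time, so a high confidence rule still needs a physical explanation before it is read as cause and effect.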

We used a glyph-based visualization to analyze the rules obtained from frequent pattern mining over the prepared datasets. Our visual analysis aims to concisely present the statistics of the rules along with a pictorial description for contextual understanding. We juxtapose the visualization with the floor plan to identify the area where the phenomenon was observed.

The rules are listed in order of their confidence. Glyph sizes are mapped to the discretization bins in our dataset, i.e. the maximum size is given to "Positive High" and the minimum size to "Negative High".



























Analyzing the rules in the above figure, we observe that the Meeting Room on Floor 1 (highlighted in orange in the above figure) had a large number of extreme behaviours.

  1. Rule1: {Emp_Count_NH}->{Equipment_Power_NH} : A large decrease in the number of employees leads to a large decrease in the equipment power sensor reading. This is expected when people leave the conference room at the end of a meeting.
  2. Rule2: {Emp_Count_NH}->{Lights_Power_NH} : A large decrease in the number of employees leads to a large decrease in the light power sensor reading.
  3. Rule3: {Emp_Count_NH}->{Return_Outlet_CO2_PM} : A large decrease in the number of employees leads to a medium increase in Return_Outlet_CO2. When people start moving out of the floor zone, the CO2 sensor shows a positive spike. This could be because during a meeting the room may be closed and the amount of CO2 in circulation is constant, and when people start leaving the CO2 concentration suddenly increases.
  4. Rule4: {Emp_Count_NH}->{Supply_Inlet_Mass_Flow_rate_NH} : When employees move out of this floor zone, the Supply Inlet Mass Flow rate sensor reading also decreases sharply.
  5. Rule5: {Emp_Count_NH}->{Thermostat_Temp_NH} : When employees move out of this floor zone, the temperature sensor reading falls sharply.
  6. Rule6: {Emp_Count_PH}->{Thermostat_Temp_NH} : A large movement of people into this zone also leads to a large fall in the temperature sensor reading.













Analyzing the rules in the above figure, we observe that the Conference zone on Floor 1 had the following behaviours.

  1. Rule1: {Emp_Count_PM}->{Lights_Power_PH} : A medium increase in the number of employees leads to a large increase in the light power sensor reading. This is expected when people enter the conference room, as the lights would be turned on.
  2. Rule2: {Emp_Count_NH}->{Equipment_Power_NH} : A large decrease in the number of employees leads to a large decrease in the equipment power sensor reading.




Conclusion

In conclusion, there is clearly a problem with the building's HVAC system. Most of the anomalies reported are in systems that maintain a comfortable temperature within the building. The fact that these anomalies occur on certain floors even on weekends, when there are very few employees in the building, further supports this conclusion. While the majority of the employees follow a regular schedule every day, an impostor visits sensitive locations in the building on multiple days.